Influences of chemotype and parental genotype on metabolic fingerprints of tansy plants uncovered by predictive metabolomics

Intraspecific plant chemodiversity shapes plant-environment interactions. Within species, chemotypes can be defined according to variation in dominant specialised metabolites belonging to certain classes. Different ecological functions could be assigned to these distinct chemotypes. However, the roles of other metabolic variation and of the parental origin (or genotype) of the chemotypes remain poorly explored. Here, we first compared the capacity of terpenoid profiles and metabolic fingerprints to distinguish five chemotypes of common tansy (Tanacetum vulgare) and to depict metabolic differences. Metabolic fingerprints captured more metabolic variation while preserving the ability to define chemotypes. These differences might influence plant performance and interactions with the environment. Next, to characterise the influence of maternal origin on chemodiversity, we performed variation partitioning and generalised linear modelling. Our findings revealed that maternal origin was a greater source of chemical variation than chemotype. Predictive metabolomics revealed 184 markers predicting maternal origin with 89% accuracy. These markers included, among others, phenolics, whose functions in plant-environment interactions are well established. Hence, these findings place parental genotype at the forefront of intraspecific chemodiversity. We recommend considering this factor when comparing the ecology of different chemotypes. Additionally, jointly including inherited variation in main terpenoids and other metabolites in computational models may help connect chemodiversity and evolutionary principles.

The evolution of plant metabolism has attracted the curiosity of scientists for decades [1][2][3] . Plants are chemically highly diverse, yet the role and origin of this diversity are not fully understood, especially regarding intraspecific chemodiversity 4,5 . Previous studies revealed high intraspecific chemodiversity in various species and linked metabolic variation to considerable ecological consequences [6][7][8] . In some plant species with high intraspecific chemodiversity, individuals can be classified into distinct chemotypes according to the occurrence and ratio of individual compounds belonging to a specific major metabolite class, such as terpenoids in aromatic plants, glucosinolates in Brassicaceae or pyrrolizidine alkaloids in Asteraceae [9][10][11] . Classifying plants into chemotypes can reveal valuable information and improve our understanding of the ecology and evolution of intraspecific chemodiversity [10][11][12] . For instance, slugs show distinct preferences for certain Solanum dulcamara (Solanaceae) chemotypes, which are determined by their glycoalkaloid composition 13 . Evolution of the cardenolide profile in Erysimum (Brassicaceae) species could be linked to the surrounding biotic pressure and confer various toxic properties 14 . Moreover, high intraspecific diversity in plant chemotypes may be crucial for invasion success, and different chemotypes may show distinct geographic distributions 15,16 . However, although chemotypes may capture a significant fraction of intraspecific chemodiversity, they do not cover the full chemical blend. In fact, other pivotal metabolites or metabolite classes, here called satellites, may differ between chemotypes and have distinct ecological functions 8,17 .
These satellite metabolites therefore comprise all metabolites that are not used to define the chemotype but may play a major role in plant development or in interactions with the environment 8,17 . The correlation between satellite metabolites and the main compounds that determine chemotypes has rarely been examined, although restricting the definition of chemotypes to a few metabolites from one major chemical class could be a source of misinterpretation. For example, interactions between plants and other organisms may be guided not only by the main metabolites defining a chemotype but also by the diversity of additional compounds 8 . Furthermore, chemotypes are heritable 10,18 , but their inheritance does not always follow Mendelian laws 19,20 . In addition, plants of a given chemotype can have distinct genetic backgrounds 19 . Thus, the parental genotype (i.e. parental origin) might also account for a substantial part of chemodiversity. In this scenario, extending research from the main chemical patterns (defining chemotypes) to the chemical variation attributable to parental genotype may considerably increase our understanding of intraspecific chemodiversity. This approach may also help decipher the genetic rules governing the inheritance of chemodiversity. Large-scale metabolomic analyses of highly chemodiverse species may help to characterise satellite metabolic diversity within chemotypes and to explore the impact of the parental genotype on metabolic variation.
www.nature.com/scientificreports/
Common tansy (Tanacetum vulgare L., syn. Chrysanthemum vulgare (L.) Bernh.; Asteraceae) possesses an astonishing intraspecific chemodiversity 8 . Tansy chemotypes are defined by their dominant terpenoid(s), which can contribute 41-99% of the total leaf terpenoid profile 1,8 . Mono-chemotypes contain one dominant terpenoid (more than 50%); examples commonly found in the field are the β-thujone and camphor chemotypes 1,6,8 . Mixed-chemotypes comprise two to three dominant terpenoids. In addition to these dominant terpenoids, tens of further terpenoids can be found in both mono- and mixed-chemotypes, contributing to the full terpenoid bouquet. Different chemotypes can co-occur, and up to 14 distinct chemotypes were found in a rural area of just a few square kilometres 6 . Previous studies highlighted the consequences of this intraspecific chemodiversity for insect behaviour and performance, as well as chemotype-specific differences in chemical responses to herbivory and abiotic constraints 12,[21][22][23] . Because tansy is outcrossing, offspring of one mother plant vary in terpenoid profiles and other chemical classes 19,24 . Hence, tansy is an ideal study system to examine the nature of satellite variation within chemotypes and to investigate whether the parental genotype influences intraspecific chemodiversity.
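The mono/mixed distinction can be made concrete with a small rule-based sketch. It assumes a leaf terpenoid profile given as amounts per compound; the >50% rule for mono-chemotypes follows the text, whereas the 20% cut-off used to call a terpenoid "dominant" in a mixed-chemotype is a hypothetical placeholder, not a criterion from this study, and `classify_chemotype` is an illustrative name.

```python
def classify_chemotype(profile, mixed_threshold=20.0):
    """Label a leaf terpenoid profile as a mono- or mixed-chemotype.

    profile: dict mapping terpenoid name -> amount (any unit)
    mixed_threshold: HYPOTHETICAL share (%) above which a terpenoid is
    treated as dominant in a mixed-chemotype; not taken from the study.
    Returns (category, list of dominant terpenoids).
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain positive amounts")
    # Express each terpenoid as a percentage of the total terpenoid profile
    shares = {name: 100.0 * amount / total for name, amount in profile.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

    top_name, top_share = ranked[0]
    if top_share > 50.0:
        # A single terpenoid above 50% defines a mono-chemotype
        return "mono", [top_name]

    # Mixed-chemotypes have two to three dominant terpenoids
    dominant = [name for name, share in ranked if share >= mixed_threshold]
    if 2 <= len(dominant) <= 3:
        return "mixed", dominant
    return "unclassified", dominant
```

For example, a profile with 45% α-thujone, 38% β-thujone and smaller amounts of other terpenoids would be labelled mixed under this placeholder rule, resembling the "ABThu" chemotype.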
To meet these objectives, terpenoid analyses using gas chromatography-mass spectrometry (GC-MS) and untargeted metabolic fingerprinting using ultra-high performance liquid chromatography coupled to quadrupole time-of-flight MS (UHPLC-QTOF-MS/MS) were performed on leaves collected from five chemotypes of tansy plants grown in a semi-field experiment. The chemotypes were derived from different maternal genotypes. Generalised linear models (GLMs) were used to test (i) whether metabolic features detected by untargeted metabolic fingerprinting were as relevant as the key terpenoid profiles for predicting chemotypes and (ii) whether the maternal origin, most likely reflecting a particular genotype, significantly affected the metabolome of tansy. The predictive capacity of selected metabolic markers was validated by resampling plants in the field and clones of these plants grown in the greenhouse. Potential consequences of our findings for the interpretation of (chemo-)ecological experiments are discussed.

Results
Tansy chemotypes are mainly defined by quantitative variation in their metabolome. As a first step to characterise variation in satellite metabolites, we captured the leaf terpenoid profiles (GC-MS) and metabolic fingerprints (UHPLC-QTOF-MS/MS) of five tansy chemotypes (181 samples) derived from fourteen maternal genotypes (i.e. four to eight maternal origins per chemotype) and grown in a semi-field common garden (for the analytical workflow, see Fig. 1). The chemotypes included two mono-chemotypes, "Keto" and "BThu", and three mixed-chemotypes, "ABThu", "Aacet" and "Myrox", defined according to their dominant terpenoids (see Materials and methods). In the field, plants were grown either in homogenous plots containing a single chemotype or in heterogenous plots containing all five chemotypes. Leaves without visible damage were sampled for chemical analyses in June 2021. In total, 52 compounds (mostly terpenoids) were detected by GC-MS, while untargeted LC-MS analyses yielded 5,066 features after pre-processing (Tables S1 & S2). Growth conditions (i.e. homogenous or heterogenous plots) had no effect on the terpenoid profiles (Fig. S1). In contrast, the five chemotypes were distinguishable by principal component analysis (PCA) and displayed distinct chemical patterns based on terpenoid profiles and/or LC-MS features (Fig. 2). The major discriminant terpenoids were congruent with the chemotype definitions 6,19 and included α- and β-thujone, artemisia ketone, artemisyl acetate and artemisia alcohol, and santolina triene and (Z)-myroxide. Next, we asked whether this discriminatory ability was due to qualitative and/or quantitative variation. Notably, 79% of all features detected by GC-MS and LC-MS occurred in all five chemotypes, and 97% were detected in more than one chemotype (Table S3).
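The qualitative comparison above amounts to recording, for each feature, the chemotypes in which it was detected. The sketch below assumes that a feature counts as detected in a chemotype when at least one sample of that chemotype has a non-zero intensity; the study's exact detection criterion is not stated in this section, and `occurrence_shares` is a hypothetical helper.

```python
def occurrence_shares(intensities, groups):
    """Share of features detected in all groups and in more than one group.

    intensities: dict mapping feature -> list of intensities, one per sample
    groups: group label (e.g. chemotype) of each sample, aligned with intensities
    A feature is 'detected' in a group if any sample of that group has a
    non-zero intensity (an assumption of this sketch).
    """
    all_groups = set(groups)
    in_all = in_several = 0
    for values in intensities.values():
        detected_in = {g for g, v in zip(groups, values) if v > 0}
        if detected_in == all_groups:
            in_all += 1
        if len(detected_in) > 1:
            in_several += 1
    n_features = len(intensities)
    return in_all / n_features, in_several / n_features
```

With real data, `groups` would hold the chemotype label of each sample, and the two returned fractions correspond to the "all five chemotypes" and "more than one chemotype" percentages.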
Quantitative variation was considerable: 39 GC-MS and 809 LC-MS features differed significantly in abundance among chemotypes (Tukey's test, P < 0.05, FDR correction) (Fig. 2b and c). Taken together, these results suggested that the distinction between tansy chemotypes resulted more from quantitative variation than from differences in the occurrence of specific compounds.
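Per-feature tests across thousands of LC-MS features require multiple-testing control. The text does not name the FDR procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up method applied to one P-value per feature, and `benjamini_hochberg` is an illustrative name (in R, `p.adjust(p, method = "BH")` does the same).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Features with an adjusted value below 0.05 would then be counted as significant. How pairwise Tukey results were combined with the FDR step is not specified here, so this only illustrates the correction itself.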

Major terpenoid profiles and metabolic fingerprints show comparable chemotype predictive performance.
To compare the predictive capacity of terpenoid profiles (i.e. GC-MS) and metabolic fingerprints (i.e. LC-MS), we employed generalised linear models (GLMs), dividing the sample set into a "training" set, a "validation" set and a "test" set. Chemotypes were highly predictable (Fig. 3a): significant GC-MS and LC-MS features yielded average accuracies of 95% and 93%, respectively. The most predictive markers, i.e. the most discriminant and predictive features, were selected based on their occurrence in the 500 models. The predictive performance of the 39 best LC-MS predictors was then compared to that of the 39 GC-MS predictors (significant terpenoids) (Table S4). Subsequently, the seven terpenoids used to define the five tansy chemotypes chosen for our study (i.e. α- and β-thujone, artemisia alcohol, artemisia ketone, artemisyl acetate, (Z)-myroxide and santolina triene) were tested. Chemotypes were predicted with 95%, 98% and 97% accuracy using the 39 significant terpenoids, the 7 chemotype-defining terpenoids or the best 39 LC-MS markers, respectively (Fig. 3a). The GLM-based modelling approach was statistically validated by evaluating the likelihood of spurious predictions using 500 permuted datasets, in which chemotypes were randomly permuted between samples; this yielded an accuracy of 5%. To test the robustness of the LC-MS markers, we used a complementary set of samples (plants resampled in the field and clones of these plants grown in the greenhouse) and predicted their chemotype. The average prediction accuracy of 90% demonstrated the robustness of the markers across years, seasons and growing conditions (Fig. 3a). Next, we analysed the intersections between the top 50 LC-MS markers for each chemotype (Table S4). The best predictors strongly differed among chemotypes (Fig. 3b). However, their value as markers did not lie in their exclusivity but rather in variation in their abundance, since only 11% of the markers were specific to one chemotype (Fig. S2 & Table S3).
We then explored the relationships between the best 39 LC-MS markers and the 7 terpenoids used to define our five chemotypes and found strong correlations (Fig. S3). Overall, these results demonstrated that both terpenoid profiles and metabolic fingerprints could efficiently predict tansy chemotypes, and they raised the question of the chemical nature of the best predictive LC-MS markers.
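Occurrence-based marker selection can be sketched as counting how often each feature is retained across the resampled models. In an elastic-net GLM, a feature is "retained" when its coefficient is non-zero; that this is how occurrence was defined here is an assumption, as is the hypothetical `select_markers` helper. In the actual workflow, the 500 feature sets would come from the fitted glmnet models in R.

```python
from collections import Counter

def select_markers(selected_per_model, top_k):
    """Rank features by how often they were retained across resampled models.

    selected_per_model: one set per fitted model, holding the features with a
    non-zero coefficient in that model (assumed meaning of 'occurrence').
    Returns up to top_k (feature, occurrence frequency) pairs, most frequent first.
    """
    n_models = len(selected_per_model)
    counts = Counter()
    for features in selected_per_model:
        counts.update(features)
    # Sort by decreasing count; ties are broken alphabetically for reproducibility
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(feature, count / n_models) for feature, count in ranked[:top_k]]
```

Setting `top_k=39` would mirror the comparison with the 39 significant terpenoids described above.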
Major terpenoids as the main predictors of tansy chemotypes, which are further defined by satellite metabolic variation. To gain further insight into the biochemical pathways characterising tansy chemotypes, we putatively annotated the top 50 LC-MS markers per chemotype using MS and MS/MS spectra. The MSI annotation level of each marker is given in Table S5. Most of the best predictors were putatively assigned as specialised (secondary) metabolites (Fig. 3c). As expected, terpenoids were overrepresented among the markers and included putative diterpenoids and terpenoid derivatives (32 compounds each), as well as sesqui-, tri- and monoterpenoids. Further markers were putatively assigned to other biochemical pathways such as phenolics (18 compounds, 9% of the best markers). Primary metabolites such as fatty acyls and carbohydrate derivatives were also found (Fig. 3c). Overall, this analysis supported the central role of major terpenoids in defining tansy chemotypes, but it also highlighted variation in other metabolites within chemotypes that should be considered in further studies.
Predictive metabolomics sheds light on the influence of maternal origin on metabolic fingerprints. While chemotypes could be used to classify tansy samples efficiently, the parental genotype may also significantly affect intraspecific chemodiversity. To test this hypothesis, we performed variation partitioning on both the GC-MS and LC-MS datasets (Fig. 1). Chemical variation in tansy metabolism was influenced by both chemotype and maternal origin (i.e. genotype) in both the GC-MS and the LC-MS data (Fig. 4a and b). The terpenoid profile was mainly determined by the chemotype (64.7%), and only 3.2% of its variation was explained exclusively by the maternal genotype. Conversely, the maternal genotype explained 17.3% of the variation in metabolic fingerprints independently of the chemotype, whereas chemotype explained only 1.4%. In total, 41 (out of 52) terpenoids and 3,688 (out of 5,066) LC-MS features differed significantly among the 14 maternal genotypes (Tukey's tests, P < 0.05, FDR correction) (Fig. S4). To further characterise the impact of maternal genotype on tansy metabolism, we employed GLMs. First, a clustering analysis was used to group the 14 maternal genotypes into four main classes, increasing the number of samples per class to allow GLM analyses (Fig. S4). While GC-MS features could hardly predict maternal genotypes, the top 5% predictive LC-MS markers (i.e. 184 markers) predicted the maternal genotype with 89% accuracy. Models were statistically validated using 500 permuted datasets (Fig. 4c). Additionally, the predictive performance of the best markers was biologically validated using a complementary set: model equations were defined on samples collected in June 2021 and applied directly to 20 plants harvested in the field in October 2022, yielding an average accuracy of 69% (Fig. 4c).
While this result underlined the predictive capacity of the best markers across years and seasons, the 20-percentage-point difference between the 2021 and 2022 predictions could be explained by (i) the low number of samples per class in the complementary set, which increased the weight of each misclassified sample, and/or (ii) slightly different abundances of these markers between seasons. The 30% accuracy obtained in permutation tests was likewise ascribed to the low number of samples per maternal genotype class in the complementary set. Thus, these results highlighted the strong influence of maternal genotype on the tansy metabolome. Moreover, the 184 LC-MS markers predicting maternal genotype with 89% accuracy differed markedly from the markers predicting chemotypes (Table S5). This result supports the hypothesis that chemotypes capture only part of the entire intraspecific chemodiversity.
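Variation partitioning with two explanatory sets (here chemotype and maternal genotype) reduces to simple arithmetic on the R² values of three models: the first set alone, the second set alone and both together. A minimal sketch, assuming (adjusted) R² values have already been obtained, for example with `varpart()` from the R package vegan; the exact implementation used in the study is not stated here, and `partition_two_sets` is a hypothetical name.

```python
def partition_two_sets(r2_x1, r2_x2, r2_both):
    """Partition explained variation between two explanatory sets X1 and X2.

    r2_x1, r2_x2: (adjusted) R^2 of models with X1 only and X2 only
    r2_both: (adjusted) R^2 of the model with X1 and X2 together
    Returns the fractions unique to X1, shared, unique to X2 and unexplained.
    With adjusted R^2, the shared fraction can be slightly negative.
    """
    return {
        "unique_x1": r2_both - r2_x2,       # explained by X1 once X2 is accounted for
        "shared": r2_x1 + r2_x2 - r2_both,  # jointly explained, not separable
        "unique_x2": r2_both - r2_x1,       # explained by X2 once X1 is accounted for
        "residual": 1.0 - r2_both,          # not explained by either set
    }
```

For illustration, made-up inputs `partition_two_sets(0.30, 0.45, 0.48)` give unique fractions of about 0.03 (X1) and 0.18 (X2), a shared fraction of 0.27 and 0.52 unexplained; the "independently of the chemotype" percentages above correspond to unique fractions of this kind.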
To test the influence of maternal genotype on the tansy metabolome within chemotypes, we employed partial least squares discriminant analyses. While GC-MS data only slightly discriminated maternal genotypes within chemotypes, a clear distinction was found using LC-MS data (Figs S5, S6). Hence, maternal genotype significantly influenced the overall intraspecific chemodiversity and represented a significant source of chemical variation within chemotypes.
Chemical variation in primary and specialised metabolism among maternal genotypes. To define the main metabolic pathways affected by maternal genotype, we putatively annotated the best 184 LC-MS markers (i.e. top 5%) (Tables S5, S6). Maternal genotype influenced both primary and specialised metabolism (Fig. 4d). Phenolics were the most affected class (29%), including flavonoids (27 compounds) such as quercetin and derivatives of quercetin, kaempferol, luteolin and naringenin (Table S5). To a lesser extent, terpenoids, fatty acyls and carbohydrate derivatives were also affected. Moreover, cinnamic acid derivatives, benzenoids and nitrogen-related compounds were represented among the best markers (Fig. 4d, Table S5). These findings highlighted the strong metabolic variation associated with the maternal genotype, including variation in major classes such as flavonoids, which could in turn have significant ecological consequences.

Discussion
Untargeted metabolomics as a complementary strategy to study intraspecific chemodiversity. The analysis of intraspecific chemodiversity offers promising perspectives to improve our understanding of the evolution of plant specialised metabolism and to ascribe ecological functions to specific chemical traits 4 . In this context, highly chemodiverse species are often classified into chemotypes based on prominent, dominant specialised metabolites, such as terpenoids in tansy 6 . Terpenoid chemotypes have also been described in numerous weed species, such as Solidago gigantea (Asteraceae), and in plants used as spices or for medicinal purposes, such as Thymus vulgaris (Lamiaceae) or Cannabis sativa (Cannabaceae) [25][26][27] . These examples highlight the fascinating chemical polymorphism that can be found even within species. Different chemotypes may show differential invasion success 26,28 and are exposed to differential selection by various herbivores, such as aphids or slugs 13,29 . Thus, classification into chemotypes can offer highly valuable insights. However, this strategy restricts our understanding of the consequences of chemotypes for ecological interactions to a specific compound class 8 . Strikingly, our untargeted analysis showed that changes in the main terpenoid profiles were accompanied by significant variation in numerous satellite metabolites, which may also be of ecological importance. First, the analysis of the GC-MS data confirmed the capacity of the terpenoid profile to define chemotypes. Notably, chemotypes BThu and ABThu were not clearly discriminated by an unsupervised statistical method applied to the GC-MS data. The main terpenoid profile may change somewhat during development and in response to the environment when plants grow under field conditions, resulting in a slight shift and thus a less clear-cut separation of these chemotypes than in chemotyping carried out at the seedling stage.
However, these chemotypes were well predicted in supervised analyses, showing that they retained some characteristic metabolic variation. Moreover, the terpenoid profiles were not affected by growth in homogenous or heterogenous plots, pointing to the consistent expression of certain chemotypes. GLMs showed comparable performance of GC-MS and LC-MS data in predicting chemotypes, making LC-MS analysis a valuable alternative for defining chemotypes. Nevertheless, compared with LC-MS analyses, which could be supplemented by measurements in positive electrospray ionisation mode, GC-MS analyses were a more efficient strategy to depict the terpenoid profile, and the terpenoid profile remained the best predictor of chemotypes. However, in contrast to GC-MS, LC-MS measurements revealed the contribution of variation in satellite metabolites to the differences between chemotypes. Prominent chemical variation among the tansy chemotypes was found, for example, in phenolics, consistent with the chemodiversity recently described in this compound class among Chrysanthemum species 30 . Additional variation was observed in fatty acyls and carbohydrate derivatives, which is congruent with previous studies on tansy 8,31 .
Analysing variation in satellite metabolites may be relevant for various objectives in research on intraspecific chemodiversity. First, these satellite metabolites need to be taken into consideration when ascribing functional roles, such as repellence or deterrence of herbivores, to a certain chemotype, as functions should not be assigned based on the dominant terpenoid(s) only 8 . For instance, phenolics also showed protective functions against aphids 32 . Second, satellite chemical variation may be important when exploring intraindividual chemodiversity among organs or tissues. For example, while tansy chemotypes are usually discriminated based on their leaf terpenoids, flowers may show slightly distinct terpenoid patterns, which have been found to explain preference and abundance patterns of pollinators and florivores 12,33 . The contents of other chemical classes in flower parts, such as pollen proteins and lipids, did not necessarily differ among tansy leaf chemotypes 12 . Besides, leaf chemotypes and phloem sap chemistry were only partially linked in tansy plants 22 , while root and leaf terpenoid profiles showed mostly uncorrelated patterns 34 . The analysis of metabolites other than the major chemotype-determining terpenoids clearly benefits our knowledge of chemodiversity within plant individuals. Third, further investigation of satellite chemodiversity may help decipher the genetic rules governing the inheritance of intraspecific chemodiversity, which may differ between biosynthetic pathways such as those of terpenoids and phenolics 35,36 . Finally, exploring the correlations between terpenoids and other specialised or primary metabolites may support the development of computational models that aim to link chemodiversity and evolutionary principles 37,38 .
Impacts of maternal origin on intraspecific chemodiversity and potential ecological consequences. Variation partitioning indicated that variation in tansy metabolism was driven not only by chemotype but also by maternal genotype (i.e. maternal origin). Maternal genotype explained 18% of the chemical variation in the tansy metabolome, and 184 markers predicted this parameter with 89% accuracy. In other words, the chemotype accounted for only certain facets of intraspecific chemodiversity in tansy. The additional chemical variation attributable to the maternal genotype could have significant consequences. For instance, terpenoids, phenolics, benzenoids and fatty acyls were among the metabolite classes most represented within the best markers for maternal genotypes. The role of terpenoids in the long-distance attraction of herbivores, their natural enemies and pollinators is well established 39,40 . These compounds also contribute to plant fitness by acting as repellents or defensive compounds against several antagonists 41,42 . Several other compounds can influence herbivore performance. For instance, phenolic glycosides displayed defensive functions against generalist herbivores 43 . Similar effects have been reported for certain cinnamic acid derivatives and tannins 44 . Besides, the occurrence of several flavonoids and other phenolics, such as quercetin, kaempferol, luteolin and naringenin derivatives as well as caffeoylquinic acids, agrees with previous reports on tansy biochemistry 45,46 . Flavonoids can also influence the behaviour of belowground organisms by conferring either stimulatory or deterrent properties [47][48][49] . In contrast, benzenoids have mostly been recognised for their role in pollinator attraction 50 . Furthermore, several metabolites found here as markers for maternal genotypes, such as terpenoids and fatty acyls, are known to affect plant responses to herbivory under challenging abiotic conditions 21,51 .
Moreover, the effect of the maternal genotype was observable not only at the scale of the overall metabolic fingerprint but also within chemotypes of tansy. Hence, the maternal genotype accounts at least partly for the observed satellite metabolic variation within chemotypes. This observation is consistent with the fact that the reliability of genotype in predicting chemotype depends on the organism 52 . In addition, a given chemotype may arise from distinct parental genotypes. Since the inheritance of the chemotype in tansy and other species is assumed to be complex and determined by a combination of genes 19,20 , the concentrations of major compounds such as terpenoids can vary according to this genetic combination and thus differ between parental genotypes within a given chemotype. Moreover, the transmission of genes underlying chemotype inheritance could be accompanied by additional genetic information governing other metabolic patterns, which could be distinguished within tansy chemotypes 8,19 .

Conclusion
Overall, our predictive untargeted metabolomics approach highlighted that (i) variation in terpenoid contents within chemotypes was accompanied by significant satellite metabolic variation and (ii) maternal genotype was a stronger driver of intraspecific chemodiversity than chemotype. The multiple consequences of this additional metabolic variation discussed above need to be considered, and researchers working with highly chemodiverse species should account for parental genotype effects. Analysing the parental genotype effect on a wider range of chemotypes and assessing the ecological consequences of satellite metabolic variation are exciting perspectives for intraspecific chemodiversity research.

Materials and methods
Plant stock. Seeds of tansy were collected in the surroundings of Bielefeld (Germany) from different maternal plants growing at least 20 m apart, assuming that these represent different maternal genotypes. Seeds were germinated, and the leaf terpenoid profiles of the offspring were determined by GC-MS to assign chemotypes based on the dominating terpenoid(s). From these offspring, plants of five chemotypes were selected for further experiments, namely two mono-chemotypes, "Keto" and "BThu", dominated (more than 50%) by artemisia ketone or β-thujone, respectively, and three mixed-chemotypes dominated by either α-thujone and β-thujone ("ABThu"), artemisyl acetate, artemisia ketone and artemisia alcohol ("Aacet") or (Z)-myroxide, santolina triene and artemisyl acetate ("Myrox"), as previously described 1,8,53 . These plants were derived from the seeds of 14 different maternal plants in total. Overall, each chemotype was derived from four to eight different maternal plants, and two to three different chemotypes were used from each maternal plant. In the end, we had 150 different chemo-genotypes, which have been kept as "stock" in a greenhouse since the end of 2019. Forest, USA). The GC injection port was kept at 240 °C. The GC temperature program started at 50 °C (held for 5 min), increased to 250 °C at a rate of 10 °C min−1 and further to 280 °C at a rate of 30 °C min−1, which was held for 3 min. Electron ionisation at 70 eV was applied. An alkane standard mix (C7-C40, Sigma Aldrich) was analysed with the same method to determine the retention indices of the terpenoid analytes 55 . Terpenoids were identified by comparing mass spectra and retention indices with those of chemical references and chemical databases, as previously described 53 .
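Retention indices from an n-alkane ladder are usually calculated with the linear (van den Dool and Kratz) formula for temperature-programmed runs, RI = 100 × [n + (t_x - t_n) / (t_(n+1) - t_n)], where n is the carbon number of the alkane eluting just before the analyte. The section does not state which formula was used, so this is a sketch under that assumption; `retention_index` is a hypothetical helper.

```python
import bisect

def retention_index(rt, alkane_rts):
    """Linear (van den Dool and Kratz) retention index for temperature-programmed GC.

    rt: retention time of the analyte (min)
    alkane_rts: dict mapping n-alkane carbon number -> retention time (min)
    The analyte must elute at or after the first alkane and before the last one.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    # Index of the first alkane eluting after the analyte
    pos = bisect.bisect_right(times, rt)
    if pos == 0 or pos == len(times):
        raise ValueError("analyte elutes outside the alkane ladder")
    n, n_next = carbons[pos - 1], carbons[pos]
    t_n, t_next = times[pos - 1], times[pos]
    # Interpolate linearly between the bracketing alkanes; (n_next - n)
    # equals 1 when every alkane of the ladder is detected
    return 100.0 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
```

For a hypothetical ladder with C10, C11 and C12 eluting at 8.0, 9.5 and 11.0 min, an analyte at 10.25 min obtains an RI of 1150, which would then be compared with reference values.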

Field experiment and leaf harvest.
For UHPLC-QTOF-MS/MS (UHPLC: Dionex UltiMate 3000, Thermo Fisher Scientific, San José, CA, USA; QTOF: compact, Bruker Daltonics, Bremen, Germany), separation was performed on a Kinetex XB-C18 column (150 × 2.1 mm, 1.7 μm, with guard column; Phenomenex) at 45 °C and a flow rate of 0.5 mL min−1 using a gradient from eluent A (Millipore H2O with 0.1% formic acid, FA) to eluent B (acetonitrile with 0.1% FA): 2 to 30% B within 20 min, increase to 75% B within 9 min, followed by column cleaning and equilibration, as described in Schweiger et al. (2021) 54 . The QTOF was operated in negative electrospray ionisation (ESI) mode at a spectra rate of 6 Hz in the m/z (mass-to-charge) range 50-1300. The settings for the MS mode were: end plate offset 500 V, capillary voltage 3,000 V, nebuliser (N2) pressure 3 bar, dry gas (N2; 275 °C) flow 12 L min−1, low mass 90 m/z, quadrupole ion energy 4 eV, collision energy 7 eV. The AutoMS/MS mode was used to obtain MS/MS spectra, ramping the isolation width and collision energy with increasing m/z. Additional MS/MS analyses of some samples were performed to target selected ions (i.e. best markers) using multiple reaction monitoring (MRM). A sodium formate calibration solution was used to recalibrate the m/z axis.
For the samples collected in 2021, raw LC-MS data were processed in DataAnalysis (v. 4.4, Bruker Daltonics) using optimised parameters: signal-to-noise ratio 3, correlation coefficient threshold 0.75, minimum compound length 20 and smoothing width 3. Bucketing was applied to group features belonging to the same metabolite (i.e. common adducts). For each bucket, the feature with the highest intensity was quantified based on its peak height in MS mode, and only these features were included in the final dataset. Features in the retention time (RT) range 1.25-29 min (i.e. excluding the injection peak) were aligned across samples (ProfileAnalysis v. 2.3, Bruker Daltonics), allowing RT shifts of 0.1 min and m/z shifts of 6 mDa. Only features whose mean intensity was at least 50 times higher than in blanks and which occurred in at least two samples were retained. For samples harvested in 2022, LC-MS data were processed following the same steps and using similar parameters with the T-ReX 3D algorithm of MetaboScape (v. 2021b, Bruker Daltonics). Settings included presence of features in at least 3 samples, correlation coefficient threshold (ESI correlation) 0.8, intensity (peak height) threshold 1,000 and minimum peak length 11. For both datasets, peak heights were normalised by dividing them by the height of the hydrocortisone [M + HCOOH-H]− ion (chemical standard) and by the sample dry weight. GC-MS and LC-MS datasets were then subjected to median normalisation, cube-root transformation and Pareto scaling before statistical analyses. Data normality was checked in MetaboAnalyst (v. 5.0) 56 . The non-normalised datasets as well as feature and sample metadata are available as supplemental data (Tables S1, S2, and S8).
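The filtering and scaling chain described above can be sketched in plain Python. Assumptions: a feature "occurs" in a sample when its peak height is above zero; median normalisation is sample-wise (each sample divided by its own median, as in MetaboAnalyst's normalisation by median); Pareto scaling uses the sample standard deviation. The function and argument names are hypothetical, and peak picking and alignment, done in the vendor software, are not covered.

```python
import math
import statistics

def preprocess(samples, blanks, is_heights, dry_weights,
               blank_ratio=50.0, min_samples=2):
    """Filter features and apply the normalisation/scaling chain (sketch).

    samples: one list of feature peak heights per sample
    blanks: one list of peak heights per blank injection (same feature order)
    is_heights: peak height of the chemical standard ion in each sample
    dry_weights: dry weight of each sample
    Returns (indices of retained features, processed samples x features matrix).
    """
    kept = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        blank_mean = statistics.mean(row[j] for row in blanks)
        n_present = sum(1 for value in column if value > 0)
        # Keep features well above blank level and present in enough samples
        if statistics.mean(column) >= blank_ratio * blank_mean and n_present >= min_samples:
            kept.append(j)

    # Divide by the standard's peak height and by the sample dry weight
    data = [[row[j] / is_h / dw for j in kept]
            for row, is_h, dw in zip(samples, is_heights, dry_weights)]

    # Median normalisation: divide each sample by its own median (assumed non-zero)
    normalised = []
    for row in data:
        med = statistics.median(row)
        normalised.append([value / med for value in row])

    # Sign-preserving cube-root transformation
    data = [[math.copysign(abs(value) ** (1.0 / 3.0), value) for value in row]
            for row in normalised]

    # Pareto scaling: centre each feature, divide by the square root of its SD
    for j in range(len(kept)):
        column = [row[j] for row in data]
        mean = statistics.mean(column)
        scale = math.sqrt(statistics.stdev(column))  # assumes SD > 0
        for row in data:
            row[j] = (row[j] - mean) / scale
    return kept, data
```

After Pareto scaling every retained feature is mean-centred, while features with larger variance keep more weight than under unit-variance scaling.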

Generalised linear models (GLMs).
To test the capacity of terpenoid profiles and metabolic fingerprints to predict chemotype and maternal genotype, GLMs were built in R (v. 4.2.1) using the glmnet package 57 , as previously described 58 . Briefly, multinomial models were developed by testing a thousand elastic-net penalty values (ranging from 0 to 1) for variable selection. Stratified sampling was used to divide the sample set uniformly into training (60%), validation (20%) and test (20%) sets. To account for the random partitioning, 500 models were built for each test. Cross-validation was applied to limit overfitting, and prediction accuracy was determined from the predictions made on the test set. To statistically validate the models, the likelihood of spurious predictions was estimated using 500 permuted datasets in which either chemotypes or maternal genotypes were randomly assigned to samples. The best markers were selected based on their occurrence in the 500 models 58 . Model performance (i.e. prediction accuracy) was compared using Student's t-tests. Finally, biological validation was performed using the complementary validation set: the model equation was calculated using the initial dataset (plants collected in the field in June 2021) and applied directly to the samples of the complementary set (collected in October 2022) to predict their chemotype or maternal genotype. The likelihood of spurious predictions was again estimated using 500 permuted datasets.
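The stratified 60/20/20 partitioning can be illustrated with a small sketch that splits the sample indices of each class separately, so that every set keeps roughly the class proportions of the full dataset. The helper name is hypothetical, and assigning rounding leftovers to the test set is an arbitrary choice of this sketch; the elastic-net fitting itself (glmnet in R) is not reproduced.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=None):
    """Split sample indices into training/validation/test sets within each class.

    labels: class label per sample (e.g. chemotype or maternal-genotype class)
    Returns three sorted index lists; rounding leftovers go to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, valid, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within the class
        n = len(indices)
        n_train = round(fractions[0] * n)
        n_valid = round(fractions[1] * n)
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]
    return sorted(train), sorted(valid), sorted(test)
```

Calling the function with 500 different seeds yields 500 random partitions, over which models can be fitted and test-set accuracies averaged.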

Annotation of LC-MS features.
Putative molecular formulas of the best LC-MS markers were determined using SmartFormula and SmartFormula 3D in MetaboScape (v. 2021b), considering the elements C, H, N, O, S, Cl and P. The most likely metabolite candidates were selected based on the m/z deviation (Δppm) and screened against chemical databases such as ChEBI, DNP (http://dnp.chemnetbase.com) and KNApSAcK 67,68 . When available, MS/MS spectra were compared to MS/MS spectra from MassBank to improve annotation confidence 69 .
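The m/z deviation used to rank candidate formulas is the relative mass error, Δppm = (observed m/z - theoretical m/z) / theoretical m/z × 10^6. A minimal sketch for deprotonated [M-H]- ions (the QTOF was run in negative mode), assuming standard monoisotopic element masses and a simple formula parser that only handles the elements listed above; other adducts, such as formate adducts, would need a different adduct mass. Function names are hypothetical.

```python
import re

# Monoisotopic masses (u) of the elements considered for formula assignment
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956,
    "S": 31.97207100, "P": 30.97376163, "Cl": 34.96885268,
}
PROTON = 1.007276466812  # mass of a proton (u)

def monoisotopic_mass(formula):
    """Monoisotopic neutral mass of a simple formula such as 'C15H10O7'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass

def ppm_error(observed_mz, formula):
    """Relative mass error (ppm) of an observed [M-H]- ion for a candidate formula."""
    theoretical = monoisotopic_mass(formula) - PROTON  # loss of a proton
    return (observed_mz - theoretical) / theoretical * 1e6
```

For quercetin (C15H10O7), the theoretical [M-H]- m/z is about 301.0354, so a hypothetical observed m/z of 301.0348 gives an error of roughly -1.9 ppm.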
In addition, an in-house library was used to compare retention times and MS and MS/MS spectra. The annotation confidence level was defined according to the Metabolomics Standards Initiative (MSI) confidence levels 70 (Table S5). For MSI level 4 (lower confidence level), the putative chemical class was assigned according to the most widely represented chemical class for a given molecular formula and based on the literature on tansy; for this confidence level, a putative compound was given as a potential example. As most of the putative annotations did not reach MSI level 2, results were mainly analysed at the chemical class level. Biochemical pathways and putative chemical classes were inferred using KEGG identifiers 71 and ClassyFire (http://classyfire.wishartlab.com).

Data availability
All data and metadata are available in supplemental tables. The non-normalised datasets, metadata and raw spectra are also publicly available on MassIVE (MSV000091314, https://doi.org/10.25345/C50Z7164V) in the sections "other", "metadata" and "raw", respectively.